active manifold
Escaping Saddle Points for Nonsmooth Weakly Convex Functions via Perturbed Proximal Algorithms
We propose perturbed proximal algorithms that provably escape strict saddles of nonsmooth weakly convex functions. The main results are based on a novel characterization of $\epsilon$-approximate local minima for nonsmooth functions, together with recent developments on perturbed gradient methods for escaping saddle points in smooth problems. Specifically, we show that under standard assumptions, the perturbed proximal point, perturbed proximal gradient and perturbed proximal linear algorithms find an $\epsilon$-approximate local minimum of a nonsmooth weakly convex function in $O(\epsilon^{-2}\log(d)^4)$ iterations, where $d$ is the dimension of the problem.
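As a concrete illustration of the flavour of method analyzed here, below is a minimal sketch of a perturbed proximal-gradient iteration, assuming an $\ell_1$ regularizer so that the proximal map is soft-thresholding; the perturbation trigger, radius and step size are illustrative placeholders, not the parameters from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def perturbed_prox_gradient(grad_f, x0, step=0.1, lam=0.1,
                            n_iter=1000, tol=1e-3, radius=1e-2, rng=None):
    """Proximal gradient with an occasional random perturbation.

    When the proximal-gradient displacement is small (a proxy for being near
    a stationary point), a small uniform perturbation is injected so the
    iterates can move away from strict saddles. All parameters illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x_new = soft_threshold(x - step * grad_f(x), step * lam)
        if np.linalg.norm(x_new - x) < tol:   # near-stationary: perturb
            x_new = x_new + rng.uniform(-radius, radius, size=x.shape)
        x = x_new
    return x
```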
Local Linear Convergence of Forward--Backward under Partial Smoothness
Jingwei Liang, Jalal Fadili, Gabriel Peyré
In this paper, we consider the Forward-Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which has a Lipschitz continuous gradient while the other is partly smooth relative to an active manifold $\mathcal{M}$. We propose a generic framework under which we show that the Forward-Backward (i) correctly identifies the active manifold $\mathcal{M}$ in a finite number of iterations, and then (ii) enters a local linear convergence regime that we characterize precisely. This gives a grounded and unified explanation of the typical behaviour that has been observed numerically for many problems encompassed in our framework, including the Lasso, the group Lasso, the fused Lasso and nuclear norm regularization, to name a few. These results may have numerous applications in signal/image processing, sparse recovery and machine learning.
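For concreteness, here is a minimal Forward-Backward (proximal gradient) sketch on the Lasso, the simplest instance covered by this framework: the regularizer is the $\ell_1$ norm and the active manifold is the set of vectors sharing the support of the solution. The data, regularization weight and step size below are illustrative; the typically observed behaviour is that the support of the iterates freezes after finitely many steps, after which the error decays linearly.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal map of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward_lasso(A, b, lam, n_iter=500):
    """Forward-Backward splitting for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1/L with L = ||A||_2^2
    x = np.zeros(A.shape[1])
    supports = []
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                           # forward (gradient) step
        x = soft_threshold(x - step * grad, step * lam)    # backward (prox) step
        supports.append(tuple(np.flatnonzero(x)))          # track the active set
    return x, supports

# Illustrative run: the support (active manifold) stabilizes after finitely
# many iterations, and the iterates then converge linearly along it.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(40)
x_hat, supports = forward_backward_lasso(A, b, lam=0.1)
```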
Online Covariance Estimation in Nonsmooth Stochastic Approximation
Jiang, Liwei, Roy, Abhishek, Balasubramanian, Krishna, Davis, Damek, Drusvyatskiy, Dmitriy, Na, Sen
We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of Hájek and Le Cam. However, no methods have been proposed to estimate this covariance matrix in a nonsmooth and potentially non-monotone (nonconvex) setting. In this paper, we study an online batch-means covariance matrix estimator introduced in Zhu et al. (2023). The estimator groups the SA iterates appropriately and computes the sample covariance among batches as an estimate of the limiting covariance. Its construction does not require prior knowledge of the total sample size, and updates can be performed recursively as new data arrives. We establish that, as long as the batch size sequence is properly specified (depending on the stepsize sequence), the estimator achieves a convergence rate of order $O(\sqrt{d}\,n^{-1/8+\varepsilon})$ for any $\varepsilon>0$, where $d$ and $n$ denote the problem dimensionality and the number of iterations (or samples) used. Although the problem is nonsmooth and potentially non-monotone (nonconvex), our convergence rate matches the best-known rate for covariance estimation methods using only first-order information in smooth and strongly convex settings. The consistency of this covariance estimator enables asymptotically valid statistical inference, including constructing confidence intervals and performing hypothesis testing.
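The batch-means construction can be illustrated with a simplified, offline sketch. The estimator studied in the paper is fully online and its batch-size schedule drives the stated rate; the version below is only meant to show how batch means yield a covariance estimate, with the batch size left as an illustrative input.

```python
import numpy as np

def batch_means_covariance(iterates, batch_size):
    """Simplified (offline) batch-means estimate of the limiting covariance.

    iterates: array of shape (n, d) of SA iterates.
    Consecutive iterates are grouped into batches, batch means are formed,
    and the scaled sample covariance of the batch means around the overall
    mean is returned.
    """
    iterates = np.asarray(iterates, dtype=float)
    n, d = iterates.shape
    n_batches = n // batch_size
    assert n_batches >= 2, "need at least two full batches"
    x = iterates[: n_batches * batch_size].reshape(n_batches, batch_size, d)
    batch_means = x.mean(axis=1)                              # (n_batches, d)
    overall_mean = iterates[: n_batches * batch_size].mean(axis=0)
    diffs = batch_means - overall_mean
    # Scaling by the batch size targets the covariance of the
    # sqrt(n)-normalized error of the running average.
    return batch_size * (diffs.T @ diffs) / (n_batches - 1)
```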
Anderson Acceleration in Nonsmooth Problems: Local Convergence via Active Manifold Identification
Li, Kexin, Bai, Luwei, Wang, Xiao, Wang, Hao
Anderson acceleration is an effective technique for enhancing the efficiency of fixed-point iterations; however, analyzing its convergence in nonsmooth settings presents significant challenges. In this paper, we investigate a class of nonsmooth optimization algorithms characterized by the active manifold identification property. This class includes a diverse array of methods such as the proximal point method, proximal gradient method, proximal linear method, proximal coordinate descent method, Douglas-Rachford splitting (or the alternating direction method of multipliers), and the iteratively reweighted $\ell_1$ method, among others. Under the assumption that the optimization problem possesses an active manifold at a stationary point, we establish a local R-linear convergence rate for the Anderson-accelerated algorithm. Our extensive numerical experiments further highlight the robust performance of the proposed Anderson-accelerated methods.
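For reference, a minimal Anderson acceleration loop for a generic fixed-point map g (such as one defined by the algorithms listed above) is sketched below; the memory size and the absence of regularization or safeguarding steps are illustrative simplifications.

```python
import numpy as np

def anderson_accelerate(g, x0, m=5, n_iter=100):
    """Anderson acceleration of the fixed-point iteration x <- g(x).

    Keeps the last m differences of iterates and residuals, solves a small
    least-squares problem for the mixing weights, and extrapolates.
    """
    x = np.asarray(x0, dtype=float)
    f = g(x) - x                        # residual of the fixed-point map
    X_hist, F_hist = [], []             # differences of iterates / residuals
    for _ in range(n_iter):
        if X_hist:
            Xk = np.column_stack(X_hist[-m:])
            Fk = np.column_stack(F_hist[-m:])
            gamma, *_ = np.linalg.lstsq(Fk, f, rcond=None)
            x_new = x + f - (Xk + Fk) @ gamma   # Anderson extrapolation
        else:
            x_new = x + f                       # plain fixed-point step to start
        f_new = g(x_new) - x_new
        X_hist.append(x_new - x)
        F_hist.append(f_new - f)
        x, f = x_new, f_new
    return x
```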
Asymptotic normality and optimality in nonsmooth stochastic approximation
Davis, Damek, Drusvyatskiy, Dmitriy, Jiang, Liwei
Polyak and Juditsky [30] famously showed that the stochastic gradient method for minimizing smooth and strongly convex functions enjoys a central limit theorem: the error between the running average of the iterates and the minimizer, normalized by the square root of the iteration counter, converges to a normal random vector. Moreover, the covariance matrix of the limiting distribution is, in a precise sense, "optimal" among all estimation procedures. A long-standing open question is whether similar guarantees - asymptotic normality and optimality - exist for nonsmooth optimization and, more generally, for equilibrium problems. In this work, we obtain such guarantees under mild conditions that hold both in concrete circumstances (e.g.
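For context, the Polyak-Juditsky central limit theorem referenced here can be stated, in the smooth and strongly convex case, as follows (standard notation assumed here rather than quoted from the paper): writing $f(x)=\mathbb{E}_\xi[\ell(x,\xi)]$ for the objective, $x^\star$ for its minimizer and $\bar{x}_n$ for the running average of the iterates,
$$\sqrt{n}\,(\bar{x}_n - x^\star) \;\xrightarrow{d}\; \mathcal{N}\!\Big(0,\ \nabla^2 f(x^\star)^{-1}\,\mathrm{Cov}\big(\nabla_x \ell(x^\star,\xi)\big)\,\nabla^2 f(x^\star)^{-1}\Big),$$
and this sandwich covariance is the one that is optimal in the local minimax sense.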
Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization
Davis, Damek, Drusvyatskiy, Dmitriy, Jiang, Liwei
We show that the subgradient method converges only to local minimizers when applied to generic Lipschitz continuous and subdifferentially regular functions that are definable in an o-minimal structure. At a high level, the argument we present is appealingly transparent: we interpret the nonsmooth dynamics as an approximate Riemannian gradient method on a certain distinguished submanifold that captures the nonsmooth activity of the function. In the process, we develop new regularity conditions in nonsmooth analysis that parallel the stratification conditions of Whitney, Kuo, and Verdier and extend stochastic processes techniques of Pemantle.
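A minimal example of the objects involved (illustrative, not taken from the paper): for $f(x,y) = |x| + (y-1)^2$, the line $\mathcal{M} = \{(0,y) : y \in \mathbb{R}\}$ is an active manifold around the minimizer $(0,1)$, since $f$ restricted to $\mathcal{M}$ is the smooth function $y \mapsto (y-1)^2$ while $f$ grows sharply (linearly) in directions normal to $\mathcal{M}$. Near $(0,1)$, subgradient dynamics are attracted to $\mathcal{M}$ and, once there, behave like a (Riemannian) gradient method on this smooth restriction.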
Training Structured Neural Networks Through Manifold Identification and Variance Reduction
Huang, Zih-Syuan, Lee, Ching-pei
This paper proposes an algorithm, RMDA, for training neural networks (NNs) with a regularization term that promotes desired structures. RMDA incurs no additional computation beyond proximal SGD with momentum, and achieves variance reduction without requiring the objective function to be of the finite-sum form. Using the tool of manifold identification from nonlinear optimization, we prove that after a finite number of iterations, all iterates of RMDA possess a desired structure identical to that induced by the regularizer at the stationary point of asymptotic convergence, even in the presence of engineering tricks like data augmentation and dropout that complicate the training process. Experiments on training NNs with structured sparsity confirm that variance reduction is necessary for such identification, and show that RMDA thus significantly outperforms existing methods for this task. For unstructured sparsity, RMDA also outperforms a state-of-the-art pruning method, validating the benefits of training structured NNs through regularization.
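The structure-inducing mechanism behind such regularized training can be illustrated with the proximal map of a group-sparsity regularizer, which zeroes out whole groups of weights (e.g. neurons). This is only the building block that proximal-style methods apply, not the RMDA update itself, and grouping by rows of a weight matrix is an illustrative choice.

```python
import numpy as np

def prox_group_l2(W, t):
    """Proximal map of t * sum_g ||W[g]||_2 with the rows of W as the groups.

    Rows whose norm falls below t are set exactly to zero, which is how a
    group-sparse regularizer induces structured sparsity (e.g. pruning
    entire neurons) in the iterates.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return W * scale

# Illustrative proximal step on one weight matrix W with learning rate lr:
#   W <- prox_group_l2(W - lr * grad, lr * lam)
# A method with the manifold identification property eventually keeps the
# same set of zero rows, i.e. the structure induced by the regularizer.
```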
Stochastic Subgradient Descent on a Generic Definable Function Converges to a Minimizer
It was previously shown by Davis and Drusvyatskiy that every Clarke critical point of a generic, semialgebraic (and more generally, definable in an o-minimal structure), weakly convex function lies on an active manifold and is either a local minimum or an active strict saddle. In the first part of this work, we show that when the weak convexity assumption fails, a third type of point appears: a sharply repulsive critical point. Moreover, we show that the corresponding active manifolds satisfy the Verdier and the angle conditions introduced in our previous work. In the second part of this work, we show that, under a density-like assumption on the perturbation sequence, stochastic subgradient descent (SGD) avoids sharply repulsive critical points with probability one. We show that such a density-like assumption can be obtained by adding a small random perturbation (e.g. a nondegenerate Gaussian) at each iteration of the algorithm. These results, combined with our previous work on the avoidance of active strict saddles, show that SGD on a generic definable (e.g. semialgebraic) function converges to a local minimum.
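The perturbation device mentioned at the end of the abstract (a small nondegenerate Gaussian added at each iteration) is easy to sketch. The subgradient oracle, step-size schedule and noise scale below are placeholders, and adding the noise to the subgradient step rather than directly to the iterate is an illustrative choice.

```python
import numpy as np

def perturbed_subgradient_descent(subgrad, x0, n_iter=10_000,
                                  step0=1.0, noise_scale=1e-3, rng=None):
    """Stochastic subgradient descent with a small Gaussian perturbation
    added at every iteration (illustrative parameters).

    subgrad(x, rng) should return a (stochastic) subgradient of the
    objective at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iter + 1):
        step = step0 / np.sqrt(k)                            # diminishing steps
        noise = noise_scale * rng.standard_normal(x.shape)   # nondegenerate Gaussian
        x = x - step * (subgrad(x, rng) + noise)
    return x
```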